
    Document Similarity from Vector Space Densities

    We propose a computationally light method for estimating similarities between text documents, which we call the density similarity (DS) method. The method is based on a word embedding in a high-dimensional Euclidean space and on kernel regression, and takes into account semantic relations among words. We find that the accuracy of this method is virtually the same as that of a state-of-the-art method, while the gain in speed is very substantial. Additionally, we introduce generalized versions of the top-k accuracy metric and of the Jaccard metric of agreement between similarity models. Comment: 12 pages, 3 figures.
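
    The abstract does not spell out the DS construction in detail; the sketch below is a minimal illustration, assuming pretrained word embeddings and a Gaussian kernel, of how a density-based similarity between two word-vector clouds could be computed. The evaluation grid, bandwidth, and cosine comparison are illustrative choices, not necessarily the paper's.

```python
import numpy as np

def doc_density(doc_vectors, grid, bandwidth=0.5):
    """Kernel density profile of a document's word-vector cloud on a fixed grid."""
    diffs = grid[:, None, :] - doc_vectors[None, :, :]          # (G, W, d)
    sq = np.sum(diffs ** 2, axis=-1) / (2 * bandwidth ** 2)     # (G, W)
    return np.exp(-sq).mean(axis=1)                             # (G,)

def density_similarity(vecs_a, vecs_b, grid, bandwidth=0.5):
    """Cosine similarity between two documents' density profiles."""
    pa = doc_density(vecs_a, grid, bandwidth)
    pb = doc_density(vecs_b, grid, bandwidth)
    return float(pa @ pb / (np.linalg.norm(pa) * np.linalg.norm(pb) + 1e-12))

# Toy usage with random stand-ins for word embeddings (d = 5).
rng = np.random.default_rng(0)
grid = rng.normal(size=(50, 5))                                 # evaluation points
doc_a, doc_b = rng.normal(size=(20, 5)), rng.normal(size=(30, 5))
print(density_similarity(doc_a, doc_b, grid))
```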

    Distributed Zero-Order Optimization under Adversarial Noise

    We study the problem of distributed zero-order optimization for a class of strongly convex functions, formed by the average of local objectives associated with different nodes in a prescribed network. We propose a distributed zero-order projected gradient descent algorithm to solve the problem. Exchange of information within the network is permitted only between neighbouring nodes. An important feature of our procedure is that it can query only function values, subject to a general noise model that does not require zero-mean or independent errors. We derive upper bounds for the average cumulative regret and optimization error of the algorithm which highlight the role played by a network connectivity parameter, the number of variables, the noise level, the strong convexity parameter, and smoothness properties of the local objectives. The bounds indicate some key improvements of our method over the state-of-the-art, both in the distributed and standard zero-order optimization settings. We also comment on lower bounds and observe that the dependency on certain function parameters in the bound is nearly optimal.
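
    As a rough illustration of the kind of procedure described here, the sketch below runs a distributed zero-order projected gradient step: nodes mix their iterates through a doubly stochastic matrix W restricted to the network, estimate gradients from two noisy function values along a random direction, and project onto a Euclidean ball. The mixing matrix, perturbation scheme, step sizes, and bandwidths are placeholder choices, not the paper's exact algorithm.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto a ball of the given radius."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def zo_two_point_grad(f_noisy, x, h, rng):
    """Gradient estimate from two noisy function values along a random direction."""
    d = x.size
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                      # uniform direction on the sphere
    return (d / (2 * h)) * (f_noisy(x + h * u) - f_noisy(x - h * u)) * u

def distributed_zo_step(X, W, f_noisy_list, step, h, rng, radius=1.0):
    """One round: mix iterates with neighbours, then take a projected zero-order step."""
    X_mix = W @ X                               # information exchange over the network
    G = np.stack([zo_two_point_grad(f, x, h, rng)
                  for f, x in zip(f_noisy_list, X_mix)])
    return np.stack([project_ball(x - step * g, radius)
                     for x, g in zip(X_mix, G)])

# Toy usage: three nodes, local objectives f_i(x) = ||x - c_i||^2 plus noise.
rng = np.random.default_rng(0)
centers = [np.array([0.2, 0.0]), np.array([0.0, 0.2]), np.array([-0.1, 0.1])]
fs = [lambda z, c=c: float(np.sum((z - c) ** 2) + 0.01 * rng.normal()) for c in centers]
W = np.full((3, 3), 0.25) + 0.25 * np.eye(3)    # doubly stochastic mixing matrix
X = np.zeros((3, 2))
for t in range(1, 1001):
    X = distributed_zo_step(X, W, fs, step=1.0 / t, h=t ** -0.25, rng=rng)
print(np.round(X.mean(axis=0), 2))              # iterates should settle near the centers' mean
```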

    Exploiting higher order smoothness in derivative-free optimization and continuous bandits

    We study the problem of zero-order optimization of a strongly convex function. The goal is to find the minimizer of the function by a sequential exploration of its values, under measurement noise. We study the impact of higher order smoothness properties of the function on the optimization error and on the cumulative regret. To solve this problem we consider a randomized approximation of the projected gradient descent algorithm. The gradient is estimated by a randomized procedure involving two function evaluations and a smoothing kernel. We derive upper bounds for this algorithm in both the constrained and unconstrained settings and prove minimax lower bounds for any sequential search method. Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters. Based on this algorithm, we also propose an estimator of the minimum value of the function achieving almost sharp oracle behavior. We compare our results with the state-of-the-art, highlighting a number of key improvements.
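
    For concreteness, here is a sketch of a two-evaluation gradient estimator with a smoothing kernel of the type the abstract refers to: perturb along a random direction scaled by r·h, difference the two noisy values, and weight by a kernel K(r). The choice K(r) = 3r/2 is a standard weight for smoothness order 2; the kernels, step sizes, and projection set used in the paper may differ, so treat this as an illustration only.

```python
import numpy as np

def two_point_kernel_grad(f_noisy, x, h, rng, kernel=lambda r: 1.5 * r):
    """Estimate the gradient from two noisy evaluations along a random direction,
    weighted by a smoothing kernel K(r) on [-1, 1]."""
    d = x.size
    zeta = rng.normal(size=d)
    zeta /= np.linalg.norm(zeta)                 # uniform direction on the unit sphere
    r = rng.uniform(-1.0, 1.0)                   # scalar smoothing variable
    y_plus = f_noisy(x + h * r * zeta)
    y_minus = f_noisy(x - h * r * zeta)
    return (d / (2.0 * h)) * (y_plus - y_minus) * kernel(r) * zeta

# Toy usage: projected zero-order descent on a noisy quadratic over the unit ball.
rng = np.random.default_rng(1)
f = lambda z: float(np.sum((z - 0.3) ** 2) + 0.01 * rng.normal())   # noisy evaluations
x = np.zeros(10)
for t in range(1, 2001):
    g = two_point_kernel_grad(f, x, h=t ** -0.25, rng=rng)
    x = x - g / t                               # diminishing step size
    n = np.linalg.norm(x)
    if n > 1.0:
        x = x / n                               # projection onto the unit ball
print(np.round(x[:3], 2))                       # should end up close to 0.3 in each coordinate
```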

    Iteratively regularized Newton-type methods for general data misfit functionals and applications to Poisson data

    We study Newton-type methods for inverse problems described by nonlinear operator equations F(u) = g in Banach spaces, where the Newton equations F'(u_n; u_{n+1} - u_n) = g - F(u_n) are regularized variationally using a general data misfit functional and a convex regularization term. This generalizes the well-known iteratively regularized Gauss-Newton method (IRGNM). We prove convergence and convergence rates as the noise level tends to 0, both for an a priori stopping rule and for a Lepskiĭ-type a posteriori stopping rule. Our analysis includes previous order-optimal convergence rate results for the IRGNM as special cases. The main focus of this paper is on inverse problems with Poisson data, where the natural data misfit functional is given by the Kullback-Leibler divergence. Two examples of such problems are discussed in detail: an inverse obstacle scattering problem with amplitude data of the far-field pattern and a phase retrieval problem. The performance of the proposed method for these problems is illustrated in numerical examples.
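
    Schematically, the variationally regularized Newton step described above amounts to a convex minimization problem in each iteration; the display below is a generic form of such a step, written with illustrative notation: S is the data misfit functional (the Kullback-Leibler divergence in the Poisson case), R the convex penalty, alpha_n the regularization parameters, and g^obs the observed data.

```latex
u_{n+1} \in \operatorname*{argmin}_{u}\;
  \mathcal{S}\bigl( F(u_n) + F'(u_n;\, u - u_n) \,;\; g^{\mathrm{obs}} \bigr)
  + \alpha_n\, \mathcal{R}(u)
```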

    Variability Measures of Positive Random Variables

    During the stationary part of the neuronal spiking response, the stimulus can be encoded in the firing rate, but also in the statistical structure of the interspike intervals. We propose and discuss two information-based measures of statistical dispersion of the interspike interval distribution: the entropy-based dispersion and the Fisher information-based dispersion. The measures are compared with the frequently used concept of standard deviation. It is shown that the standard deviation is not well suited to quantify some aspects of dispersion that are often expected intuitively, such as the degree of randomness. The proposed dispersion measures are not entirely independent, although each describes the interspike intervals from a different point of view. The new methods are applied to common models of neuronal firing and to both simulated and experimental data.
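
    As a toy illustration of the point about the standard deviation, the sketch below compares the standard deviation with a simple entropy-based spread (the exponential of an estimated differential entropy) on two simulated interspike-interval distributions. The quantity used here is a generic stand-in; the dispersion coefficients defined in the paper may be normalized differently.

```python
import numpy as np

def diff_entropy(samples, bins=400):
    """Rough histogram plug-in estimate of differential entropy (in nats)."""
    p, edges = np.histogram(samples, bins=bins, density=True)
    widths = np.diff(edges)
    mask = p > 0
    return float(-np.sum(p[mask] * np.log(p[mask]) * widths[mask]))

rng = np.random.default_rng(2)
n = 200_000
exp_isi = rng.exponential(scale=1.0, size=n)                 # Poisson-like firing
bimodal_isi = np.where(rng.random(n) < 0.5,                   # two tight ISI clusters
                       rng.normal(0.1, 0.01, n),
                       rng.normal(2.0, 0.01, n))
for name, isi in [("exponential", exp_isi), ("bimodal", bimodal_isi)]:
    print(f"{name:>12}: sd = {isi.std():.2f}, exp(H) = {np.exp(diff_entropy(isi)):.2f}")
```

    The two interval distributions have nearly the same standard deviation, while the entropy-based quantity rates the bimodal spike train as far less dispersed, which is the kind of discrepancy the abstract points to.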

    Nonparametric recursive variance estimation

    Consider an i.i.d. sample from a random vector (X, Y) ∈ R^d × R. We are interested in recursive estimates of the regression function f(x) = E(Y | X = x) and of the variance function Var(Y | X = x). We use recursive kernel estimates for both problems and prove pointwise mean square convergence with rates under appropriate conditions. The results are compared with nonrecursive estimators, which have recently been suggested by various authors.
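
    The following is a small sketch of the recursive idea, assuming a Gaussian kernel and a bandwidth schedule h_n = c·n^(-rate): running kernel-weighted sums of Y and Y² are updated with each new observation, yielding estimates of both the regression function and the conditional variance at a fixed point without revisiting past data. The class name, bandwidth schedule, and constants are illustrative, not those analyzed in the paper.

```python
import numpy as np

class RecursiveKernelEstimator:
    """Running kernel-weighted sums for f(x0) = E(Y | X = x0) and Var(Y | X = x0)."""

    def __init__(self, x0, c=0.5, rate=0.2):
        self.x0 = np.atleast_1d(np.asarray(x0, dtype=float))   # fixed evaluation point
        self.c, self.rate = c, rate                             # bandwidth h_n = c * n**(-rate)
        self.n = 0
        self.num_y = self.num_y2 = self.den = 0.0

    def update(self, x, y):
        """Incorporate one new observation (X_n, Y_n); past data are never stored."""
        self.n += 1
        h = self.c * self.n ** (-self.rate)
        u = (self.x0 - np.atleast_1d(x)) / h
        k = np.exp(-0.5 * float(u @ u)) / h ** self.x0.size     # Gaussian kernel weight
        self.num_y += y * k
        self.num_y2 += y * y * k
        self.den += k

    def regression(self):
        return self.num_y / self.den if self.den > 0 else float("nan")

    def variance(self):
        if self.den <= 0:
            return float("nan")
        m1, m2 = self.num_y / self.den, self.num_y2 / self.den
        return max(m2 - m1 ** 2, 0.0)

# Toy usage: Y = sin(X) + noise with standard deviation 0.3, evaluated at x0 = 0.5.
rng = np.random.default_rng(3)
est = RecursiveKernelEstimator(x0=0.5)
for _ in range(50_000):
    x = rng.uniform(-2, 2)
    est.update(x, np.sin(x) + 0.3 * rng.normal())
print(round(est.regression(), 2), round(est.variance(), 2))    # roughly sin(0.5) ≈ 0.48 and 0.09
```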

    Visual Error Criteria for Qualitative Smoothing

    An important gap between the classical mathematical theory and the practice and implementation of nonparametric curve estimation is due to the fact that the usual norms on function spaces measure something different from what the eye can see in a graphical presentation. Mathematical error criteria that more closely follow "visual impression" are developed and analyzed from both graphical and mathematical viewpoints. Examples from wavelet regression and kernel density estimation are considered.

    How Sensitive Are Average Derivatives?

    Average derivatives are the mean slopes of regression functions. In practice they are estimated via a nonparametric smoothing technique. Every smoothing method needs a calibration parameter that determines the finite-sample performance. In this paper we use the kernel estimation method and develop a formula for the bandwidth that describes the sensitivity of the average derivative estimator. One can determine an optimal smoothing parameter from this formula, which turns out to undersmooth the density of the regression variable.
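
    To make the bandwidth sensitivity concrete, here is a sketch of a standard score-weighted kernel average derivative estimator (not the paper's sensitivity formula itself), evaluated at several bandwidths on simulated data with true mean slope 2; the kernel, bandwidths, and leave-one-out choice are illustrative.

```python
import numpy as np

def average_derivative(x, y, h):
    """Score-weighted kernel estimate of E[m'(X)] in one dimension, bandwidth h."""
    n = x.size
    diff = (x[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * diff ** 2) / np.sqrt(2 * np.pi)   # Gaussian kernel
    kprime = -diff * k                                   # derivative of the kernel
    np.fill_diagonal(k, 0.0)                             # leave-one-out weights
    np.fill_diagonal(kprime, 0.0)
    f_hat = k.sum(axis=1) / ((n - 1) * h)                # density estimate at X_i
    f_prime_hat = kprime.sum(axis=1) / ((n - 1) * h ** 2)
    return -np.mean(f_prime_hat / f_hat * y)

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(size=1000)                      # true average derivative = 2
for h in (0.05, 0.2, 0.8):
    print(h, round(average_derivative(x, y, h), 2))      # the estimate moves noticeably with h
```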

    Does data interpolation contradict statistical optimality?

    We show that classical learning methods interpolating the training data can achieve optimal rates for the problems of nonparametric regression and prediction with square loss.
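
    One concrete construction in this direction is a Nadaraya-Watson regressor with a singular kernel, which fits the training data exactly yet still performs local averaging away from the data points. The 1-D sketch below is illustrative only (the exponent, bandwidth, and kernel support are placeholder choices, and the estimators and conditions studied in the paper are more general).

```python
import numpy as np

def singular_nw(x_train, y_train, x_eval, h=0.2, a=0.49):
    """Nadaraya-Watson estimate with the singular kernel K(u) = |u|^(-a) on |u| <= 1."""
    preds = np.empty(x_eval.size, dtype=float)
    for j, x0 in enumerate(x_eval):
        u = np.abs(x0 - x_train) / h
        if np.any(u == 0.0):                    # exact hit on a training point: interpolate
            preds[j] = y_train[u == 0.0][0]
            continue
        w = np.where(u <= 1.0, u ** (-a), 0.0)  # singular, compactly supported weights
        preds[j] = (y_train @ w) / w.sum() if w.sum() > 0 else np.nan
    return preds

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=200)
print(bool(np.allclose(singular_nw(x, y, x), y)))        # True: the fit interpolates the data
```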